BET-independent MLV-based Vectors Target Away From
Promoters and Regulatory Elements 

Sara El Ashkar, Jan De Rijck, Jonas Demeulemeester, Sofie Vets, Paradise Madlala, Katerina Cermakova, Zeger Debyser and
Rik Gijsbers

Stable integration in the host genome renders murine leukemia virus (MLV)-derived vectors attractive tools for gene therapy. 
Adverse events in otherwise successful clinical trials caused by proto-oncogene activation due to vector integration hamper 
their application. MLV and MLV-based vectors integrate near strong enhancers, active promoters, and transcription start sites 
(TSS) through specific interaction of MLV integrase (IN) with the bromodomain and extra-terminal (BET) family of proteins, 
accounting for insertional mutagenesis. We identified a BET-interaction motif in the C-terminal tail of MLV IN conserved among 
gammaretroviruses. By deletion of this motif or a single point mutation (IN_W390A), BET-independent MLV (BinMLV) vectors were engineered.
BinMLV vectors carrying IN_W390A integrate at wild-type efficiency, with an integration profile that no longer correlates with BET
chromatin distribution nor with the traditional markers of MLV integration. In particular, BinMLV vector integration associated 
less with oncogene TSS compared to the MLV vectors currently used in clinical trials. Together, these findings open perspectives 
to increase the biosafety of gammaretroviral vectors for gene therapy.

Introduction

Gene transfer vectors based on retroviruses have been used successfully in several gene transfer
trials to treat genetic disorders, with clear signs of efficacy for more than 90% of the patients
in clinical trials for primary immunodeficiencies. However, in 10% of the patients, insertional
mutagenesis resulted in uncontrolled clonal proliferation. The first reported adverse events
originated from vector integration in the proximity of the LMO2 proto-oncogene promoter, resulting
in aberrant LMO2 expression and deregulated premalignant cell proliferation. Similar events were
reported for integrations deregulating CCND2, BMI1, and EVI1. The fact that these events were
reported in several clinical trials involving the transplantation of stem cells genetically
corrected with retroviral vectors indicates that insertional mutagenesis is not a merely
theoretical risk and highlights the importance of identifying the mechanisms underlying
vector-induced genotoxicity.

Stable insertion of the viral DNA into the host-cell genome is a key feature of the retroviral
life cycle and part of the evolutionary strategy by which retroviruses maximize survival and
propagation, combining transmission of the viral genome to the host-cell progeny with persistent
viral gene expression. Integration is a nonrandom process, with different integration site
distribution patterns for each retroviral family. Lentiviruses (including the human
immunodeficiency virus (HIV)-1) prefer integration into the body of actively transcribed genes,
while gammaretroviruses, such as murine leukemia virus (MLV), predominantly integrate in the
vicinity of strong enhancers, transcription start sites (TSS), CpG islands, and DNase
I-hypersensitive sites (DHS), accounting for their integration close to proto-oncogene promoters
and the increased risk of mutational oncogenesis. Retroviral integration preference is dictated by
host proteins that tether the viral preintegration complex to the chromatin. For HIV-1, the lens
epithelium-derived growth factor (LEDGF/p75) is the dominant cellular cofactor. Via direct
interaction with the lentiviral integrase (IN), LEDGF/p75 tethers the preintegration complex to
the body of active genes, explaining the integration bias of lentiviruses. Recently, we and others
reported that the bromodomain and extra-terminal (BET) family of proteins (BRD2, BRD3, and BRD4)
interact with MLV integrase (IN) and target MLV integration.

As chromatin readers, BET proteins bind acetylated histones via tandem bromodomains. BET proteins
play a role in transcriptional elongation, cell cycle progression and cancer; in addition, we
showed that they bind MLV IN and colocalize with it in the nucleus of the cell. MLV integration
site distribution corresponds to the chromatin-binding profile of BET proteins, and inhibition of
BET chromatin binding via bromodomain inhibitors or BET protein knockdown inhibited
gammaretrovirus replication, targeting MLV integration away from TSS. Stable expression of an
artificial fusion that links the BRD4 ET domain with the LEDGF/p75 chromatin binding domain
resulted in an integration site profile reminiscent of that of HIV, underscoring BET proteins as
the main determinant of MLV integration site distribution. Rather than re-engineering the cellular
cofactor, redesigning MLV-based vectors to render them BET-independent is more attractive from a
translational point of view. In this manuscript, we report the development of BET-independent MLV
(BinMLV) vectors with an altered integration profile, particularly showing diminished integration
in the vicinity of markers for retroviral integration.

Results 

The BET interaction domain locates to the unstructured
C-terminal tail of MLV IN and is dictated by W390

To increase the biosafety of retroviral vectors, we set out to develop cofactor-independent
retroviral vectors. We previously pinpointed the interaction between BET proteins and MLV IN to
the ET domain of BET proteins and the C-terminal tail of MLV IN (amino acids 381-408),
respectively. In this study, we further dissected and characterized the BET-MLV IN interaction.
Alignment of the C-terminal tail (Ct) of gammaretroviral integrases (amino acids 381-408, IN_Ct)
revealed a conserved sequence (W390-X(3)-R/K-S/T-X(2)-PLK-I/L-R-I/L-X-R405) found exclusively in
gammaretroviruses (Figure 1a) and absent in all other retroviral genera, such as α-, β-, δ-, and
ε-retroviruses (data not shown). Of note, the IN_Ct coding sequence overlaps with that of the
envelope (Env) open reading frame (Supplementary Figure S1a,b). In contrast to IN_Ct, alignment of
the N-terminal tails of gammaretroviral Env proteins showed low conservation, indicating that the
sequence conservation in this region is due to selective pressure on this portion of the IN
sequence.
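
The consensus described above can be written as a regular expression to scan protein sequences for candidate BET-interaction motifs. The following is a minimal sketch, not the alignment procedure used in this study: the function name and the toy sequences are hypothetical, and the toy peptide is an invented string that merely satisfies the consensus, not a real integrase tail.

```python
import re

# Consensus from the text: W-X(3)-[R/K]-[S/T]-X(2)-P-L-K-[I/L]-R-[I/L]-X-R
# The lookahead wrapper lets finditer report overlapping occurrences as well.
BET_MOTIF = re.compile(r"(?=(W.{3}[RK][ST].{2}PLK[IL]R[IL].R))")


def find_bet_motif(protein_seq):
    """Return (1-based start, matched text) for every motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in BET_MOTIF.finditer(seq)]


# Hypothetical toy sequences (assumption: NOT taken from any real integrase).
toy_wt = "GGSWAAARSAAPLKIRLARGG"
toy_w_to_a = toy_wt.replace("W", "A", 1)  # mimics a W-to-A substitution

if __name__ == "__main__":
    print(find_bet_motif(toy_wt))
    print(find_bet_motif(toy_w_to_a))
```

Because the tryptophan is a fixed position in the pattern, replacing it with alanine removes the match, which mirrors the logic of testing W390A as a single point mutant.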

In a first step to further define the BET-interaction domain in MLV IN, we selectively mutated two
patches of conserved amino acids into alanine in the MLV C-terminal tail peptide, fused
N-terminally to a glutathione-S-transferase (GST) tag, generating two triple-alanine mutants: one
covering the first patch of the motif, which includes W390, and GST-IN_Ct K400A/R402A/R405A.
Protein integrity was confirmed by SDS-PAGE (Supplementary Figure S1c). Interaction with
His-tagged BRD4_ET (His-BRD4_ET) was evaluated in an AlphaScreen assay, using GST-IN_Ct as a
control (Figure 1b). While GST-IN_Ct interacted with His-BRD4_ET, as previously described, neither
of the triple alanine mutant C-terminal peptides interacted. Site-directed mutagenesis of each
individual conserved amino acid showed that five out of six single mutants still interacted with
His-BRD4_ET (Figure 1c,d), while binding of GST-IN_Ct W390A was severely affected (Figure 1c).
These data were corroborated in a similar AlphaScreen assay evaluating binding to maltose-binding
protein (MBP)-tagged BRD2_ET and BRD3_ET (Supplementary Figure S1d,e). Next, we introduced the
W390A mutation into full-length MLV IN protein (IN_W390A) and, in parallel, deleted the C-terminal
tail of MLV IN (dCt, a 27 amino acid C-terminal truncation; IN_dCt). Both IN_W390A and IN_dCt lost
interaction with BRD4_ET in an AlphaScreen assay (Figure 1e). These data were confirmed by
coimmunoprecipitation of GFP-tagged BRD4 from 293T nuclear extracts transiently expressing
flag-tagged IN_WT, IN_W390A, or IN_dCt (data not shown). Gupta et al. reported that residues in
the MLV IN catalytic core domain (CCD; E266, L268, and Y269) were important for the BET
interaction. However, E266A, L268A, or Y269A substitutions in full-length IN did not abolish the
interaction with BRD4_ET (Figure 1e). These results establish the C-terminal tail of MLV IN, and
more specifically W390, as a critical hot spot for the interaction with BET proteins.



BET-independent MLV (BinMLV) vectors efficiently
transduce cells

Inspired by older reports that viruses with a C-terminal truncation of MLV IN are replication
competent, we engineered MLV vector packaging plasmids carrying an IN with a truncated C-terminal
tail (27 amino acids) or the single W390A mutation. VSV-G pseudotyped MLV-based vectors defective
for BET interaction and encoding an enhanced green fluorescent protein (eGFP) reporter, referred
to as MLV_IN-dCt and MLV_IN-W390A, were produced in parallel with wild-type MLV vector
(MLV_IN-WT). Viral vector production efficiency was monitored by reverse transcriptase (RT)
activity. While RT units for MLV_IN-WT and MLV_IN-W390A vectors were comparable (P > 0.05), a
sixfold lower RT activity was detected for MLV_IN-dCt (P < 0.001), indicating that fewer
MLV_IN-dCt viral particles were produced (Figure 2a). Following normalization for RT activity,
vector preparations were used to transduce SupT1 cells. Transduction efficiency (% gated cells)
was evaluated 2 days post-transduction for different vector dilutions. MLV_IN-W390A transduced
SupT1 cells as efficiently as MLV_IN-WT (P > 0.01), while truncation of the C-terminal tail
significantly reduced the transduction efficiency of MLV_IN-dCt (P < 0.001) compared to MLV_IN-WT
and MLV_IN-W390A (Figure 2b). These data were corroborated at 10 days post-transduction,
underscoring stable vector integration and excluding expression from nonintegrated vector
particles (Supplementary Figure S2a). Identical results were obtained upon transduction of primary
human CD4+ T cells (Figure 2c). Vector integration in SupT1 cells was quantified by quantitative
polymerase chain reaction (PCR), demonstrating comparable integrated copies for MLV_IN-W390A and
MLV_IN-WT, whereas integrated copies were twofold lower for MLV_IN-dCt (P < 0.01 compared to
MLV_IN-WT), which is in line with the transduction efficiency (Figure 2d). Interestingly, although
transduction efficiency and integrated copies were comparable for MLV_IN-W390A and MLV_IN-WT in
SupT1 cells and CD4+ T cells, eGFP mean fluorescence intensity was 25% lower for MLV_IN-W390A and
MLV_IN-dCt (Supplementary Figure S2b,c) (P < 0.01). This decrease in mean fluorescence intensity
in both BET-independent conditions may be indicative of an altered integration site distribution
resulting in lower gene expression.

Loss of BET interaction does not affect the local MLV
integration site sequence

We next asked whether truncation of the C-terminal tail of MLV IN or the introduction of W390A
resulted in redistribution of integration sites. Integration sites were determined as described
previously, yielding 4,896, 5,906, and 4,129 unique vector integration sites in SupT1 cells for
MLV_IN-WT, MLV_IN-W390A, and MLV_IN-dCt, respectively. In addition, we determined integration
sites in primary human CD4+ T cells for MLV_IN-WT and MLV_IN-W390A, yielding 1,831 and 485 unique
integrations, respectively. Random control sites were generated computationally and matched to
experimental sites (matched random control (MRC)).

Retroviral INs show weak but discernible target sequence specificity at the local site of
integration. To check whether truncation or mutation of MLV IN influences the consensus sequence
flanking the integration site, we determined sequence logos (Supplementary Figure S3). The local
integration site neighborhood remained unaffected, in agreement with the fact that IN binding to
local target DNA is determined by IN-DNA interactions of the IN catalytic core.

Figure 1 Characterization of the MLV IN-BET interface. (a) Schematic representation of MLV IN. The N-terminal HHCC zinc-binding
domain, the catalytic core domain (CCD) and the C-terminal domain (CTD) are indicated. The three residues in the CCD analyzed
in this manuscript are indicated. The C-terminal (Ct) tail (indicated in gray) is aligned from different gammaretroviruses. Note that
a consensus BET-interaction motif (W390-X(3)-R/K-S/T-X(2)-PLK-I/L-R-I/L-X-R405) is conserved among all gammaretroviruses. Protein
sequences were downloaded from the UniProt database, aligned using T-Coffee and manually refined. * indicates amino acids mutated
in this manuscript. (b-d) Interaction of 20 nmol/l His-tagged mBRD4_ET with increasing amounts of GST-tagged MLV IN C-terminal tail
(amino acids 381-408) (GST-IN_Ct) or the indicated derived mutants as measured by AlphaScreen. (e) Interaction of increasing amounts
of GST-tagged BRD4 ET with 80 nmol/l His-IN_WT or the indicated mutants as measured by AlphaScreen. Representative experiments
are shown. Error bars indicate the standard deviations of triplicate data points. AKR-MLV, AKR murine leukemia virus (P03356); BaEV,
Baboon endogenous virus (P10272); Cas-BR-E, Cas-Br-E murine leukemia virus (P08361); en-FeLV, endogenous feline leukemia virus
(P10273); GaLV, Gibbon ape leukemia virus (P21414); KoRV, Koala retrovirus (Q9TTC1); MLV, murine leukemia virus (P03355); PERV,
Porcine endogenous retrovirus (Q8UM96).



Loss of the BET interaction uncouples MLV integration
from BET hot spots and traditional markers of MLV
integration

We previously showed that MLV integration sites and BET protein (BRD-2, -3, and -4) chromatin
immunoprecipitation sequencing (ChIP-seq) tags in 293T cells tightly correlate and concentrate
around RefGene TSS with a comparable bimodal distribution. We binned BET protein chromatin
occupancy (BET protein ChIP-seq tag densities) in a 10 kb window around MLV_IN-WT, MLV_IN-W390A,
and MLV_IN-dCt integration sites in SupT1 cells (Figure 3). Whereas MLV_IN-WT integration
correlated with ChIP-seq read density for BRD-2, -3, and -4, densities around MLV_IN-W390A and
MLV_IN-dCt sites were markedly lower. Similar results were obtained when assessing CD4+ T-cell
BRD4 ChIP-seq tag frequencies (Supplementary Figure S4a). Together, these results indicate that
both MLV_IN-W390A and MLV_IN-dCt no longer integrate via BET proteins.

Figure 2 Transduction efficiency of BinMLV vectors. (a) Relative murine leukemia virus (MLV) vector production determined by reverse
transcriptase activity per ml of BinMLV vectors as measured by SYBR Green I product-enhanced reverse transcriptase assay (SG-PERT).
RTU, reverse transcriptase units. Average values and standard deviations of triplicate measurements are shown. (b,c) Transduction of SupT1
cells (b) and CD4+ T cells (c) with equal RT units of the indicated BinMLV vectors expressing eGFP. After 48 hours, the percentage of
eGFP-positive cells was determined for the indicated vector dilutions. Average values and standard deviations of triplicate measurements are shown.
(d) Normalized integrated proviral copies (to RNaseP) as determined via quantitative polymerase chain reaction in transduced SupT1 cells
at 10 days post-transduction. Representative experiments are shown. Error bars indicate the standard deviations of triplicate data points.
Differences were determined using Student's t-test. **P < 0.01, ***P < 0.001.

In line with previous reports, MLV_IN-WT integration in SupT1 cells was enriched within a 2 kb
window around TSS (19.24%), CpG islands (18.67%), and DHS (43.53%) (P < 0.001 compared to MRC)
(Figure 4a). Interestingly, MLV_IN-W390A and MLV_IN-dCt integration associated significantly less
with these features (TSS 9.81-10.17%, CpG islands 9.47-9.91%, and DHS 31.62-32.48%, for
MLV_IN-W390A and MLV_IN-dCt, respectively; P < 0.001 compared to MLV_IN-WT) (Figure 4a).
Comparable data were obtained for larger window sizes (only 4 kb is shown) (Figure 4a). For
comparison, data sets were juxtaposed to integration site sets of other retroviruses, such as
foamy virus (FV) and HIV-1, indicating that uncoupling of the BET interaction for BinMLV vectors
results in an integration site pattern that resembles that of FV for these features (Figure 4a).
When integration sites were binned based on their distance to TSS, CpG islands, or DHS island
midpoints (Figure 4b-d, respectively), the lack of BET interaction resulted in a shift of
integration away from those features and toward a more random integration pattern (compare
brown/green and red bars), phenocopying the FV distribution (blue bars). Similar data were
obtained when zooming in on oncogene TSS (Figure 4a,e). While 3.23% of MLV_IN-WT integration sites
landed within a 2 kb window around oncogene TSS, only 1.93 and 2.08% of MLV_IN-W390A and
MLV_IN-dCt integrations occurred in this window, respectively (P < 0.001 compared to MLV_IN-WT),
much like the FV distribution (1.63%), whereas HIV disfavors TSS (0.75%), in line with previous
reports.

Figure 3 Loss of the BET interaction uncouples BinMLV integration from BET hot spots. Mean background-subtracted ChIP-seq read
density for BRD-2, -3, and -4 or the combination thereof in 50 bp bins in a 10 kb window around MLV_IN-WT, MLV_IN-W390A, and MLV_IN-dCt integration
sites.
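
The window-based percentages reported for TSS, CpG islands, DHS and oncogene TSS can be illustrated with a small routine that counts integration sites falling within a window centered on the nearest annotated feature. This is only a sketch under stated assumptions: a "2 kb window" is interpreted here as 1 kb on either side of the feature position, coordinates are simple (chromosome, position) tuples, and the function name is hypothetical rather than part of the analysis pipeline used in this study.

```python
from bisect import bisect_left


def fraction_within_window(sites, features, window_bp=2000):
    """Fraction of sites with a feature within window_bp/2 on the same chromosome.

    sites, features: iterables of (chromosome, position) tuples.
    """
    half = window_bp / 2
    # Group feature positions per chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in features:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    sites = list(sites)
    hits = 0
    for chrom, pos in sites:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # Only the features immediately left and right of the site can be nearest.
        neighbours = positions[max(i - 1, 0):i + 1]
        if any(abs(pos - p) <= half for p in neighbours):
            hits += 1
    return hits / len(sites) if sites else 0.0


if __name__ == "__main__":
    # Hypothetical toy coordinates, for illustration only.
    toy_features = [("chr1", 10000), ("chr1", 50000), ("chr2", 5000)]
    toy_sites = [("chr1", 10900), ("chr1", 11100), ("chr1", 49000),
                 ("chr2", 9000), ("chr3", 100)]
    print(fraction_within_window(toy_sites, toy_features, window_bp=2000))
```

Running the same function on matched random control sites gives the baseline against which enrichment or depletion would be judged.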

In addition, we analyzed integration preferences for MLV_IN-W390A and MLV_IN-dCt in SupT1 cells
relative to a wide range of genomic features. MLV_IN-W390A and MLV_IN-dCt showed a decreased
frequency of integration in areas rich in CpG islands, DHS, and high in GC content, and overall
distributed more randomly (toward MRC) compared to MLV_IN-WT (Figure 4f). Nonetheless, integration
frequencies were still significantly different from MRC (P < 0.001, data not shown). Likewise, we
compared the density of integration sites with that of a set of histone modifications
(acetylation/methylation) and three chromatin-bound proteins (Pol II, H2AZ, and CTCF), mapped
using chromatin immunoprecipitation and Solexa sequencing (ChIP-Seq) (detailed information on
these epigenetic marks and their roles can be found in refs. 32,33). Compared to MLV_IN-WT, BinMLV
integration occurred less frequently near sites marked for active transcription by epigenetic
modifications (histone acetylations and methylations of H3K4 and H4K20, among others), bound RNA
Pol II, or H2AZ (a histone variant associated with promoters; Figure 4g). Overall, BinMLV
distributed more randomly (shifting toward MRC) compared to MLV_IN-WT, whereas MLV_IN-W390A and
MLV_IN-dCt integration site distributions were identical (Supplementary Figure S4b,c; no
statistical difference between both integration site data sets).
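
The heat maps in Figure 4f,g express preference for a feature as an ROC area relative to matched random controls. As a rough illustration of what such a value means, the sketch below computes the probability that a randomly chosen integration site has a higher feature value than a randomly chosen control site (ties counting one half). This pooled comparison is a simplifying assumption and is not necessarily the matched-set procedure behind the published heat maps; the function name and the numbers in the example are hypothetical.

```python
def roc_area(site_values, control_values):
    """Pooled ROC area: P(site value > control value) + 0.5 * P(tie).

    0.5 means no preference, values above 0.5 mean the feature is favored,
    values below 0.5 mean it is disfavored.
    """
    if not site_values or not control_values:
        raise ValueError("both value lists must be non-empty")
    wins = 0.0
    for s in site_values:
        for c in control_values:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(site_values) * len(control_values))


if __name__ == "__main__":
    # Hypothetical feature values (e.g. local feature density) per site.
    print(roc_area([3, 4, 5], [1, 2, 3]))  # sites mostly higher than controls
    print(roc_area([1, 2], [1, 2]))        # identical distributions
```

A vector whose ROC areas move toward 0.5 across features is behaving more like the random controls, which is how the shift "toward MRC" can be read.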

Similar results were obtained when assessing MLV_IN-WT and MLV_IN-W390A integration site
distributions in primary human CD4+ T cells (Supplementary Figure S5) and HeLa cells (data not
shown). Whereas CD4+ T-cell BRD4 ChIP-seq tag frequencies correlated with MLV_IN-WT integration,
this association was lost for MLV_IN-W390A (Supplementary Figure S5a). In line with the data
obtained in SupT1 cells, MLV_IN-W390A associated significantly less with TSS, CpG islands and DHS
compared to MLV_IN-WT (P < 0.001, Supplementary Figure S4b-e). Likewise, MLV_IN-W390A integration
shifted more toward random compared to MLV_IN-WT for a range of genomic features (Supplementary
Figure S5f) and histone modifications (Supplementary Figure S5g).

Together, these findings demonstrate that BinMLV vectors no longer integrate via BET proteins,
leading to a more random integration pattern that is detargeted from traditional markers of MLV
integration. This suggests that BinMLV vectors display a safer integration site profile, which
opens perspectives for future translation of BinMLV vectors to gene therapeutic applications.
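
The pairwise comparisons of integration percentages in this section rely on Fisher's exact test. The following is a minimal, dependency-free sketch of a two-sided Fisher's exact test on a 2x2 table (sites inside versus outside a window, for two vectors). The counts in the example are hypothetical and are not taken from the data sets of this study; the function name is likewise an assumption for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: sites near oncogene TSS vs elsewhere for two vectors.
    p = fisher_exact_two_sided(130, 3870, 80, 3920)
    print(f"two-sided p = {p:.2e}")
```

With real data, the four cells would be the numbers of unique integration sites inside and outside the window for each of the two vectors being compared.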



Discussion 

Different animal models and several clinical trials underscored the potential of γ-retroviral
vectors for gene marking and gene therapy. Unfortunately, further progress was thwarted by the
advent of severe side effects, such as induced clonal dominance and malignant transformation,
referred to as insertional mutagenesis. The main determinants of retroviral insertional
oncogenesis are the integration site profile and transactivation of neighboring genes by strong
promoter/enhancer elements in the U3 region of retroviral LTRs. To reduce the potential for clonal
proliferation, self-inactivating (SIN) vectors were developed, showing a lower tumorigenic
potential in preclinical assays and gene-marking studies. However, even SIN vectors are mutagenic,
provided that they contain a sufficiently strong internal promoter, albeit at decreased
incidences. Likewise, SIN lentiviral vectors harboring physiological promoter/enhancers caused
insertional dysregulation of cellular genes in erythroid cells at high frequencies. In addition,
it should be taken into account that SIN architecture is potentially genotoxic, because
integration may disrupt or impede open reading frames or regulatory regions, since the integration
profile of SIN vectors is not different from that of traditional retroviral vectors.

Design of safer viral vectors for gene therapy requires insight into the molecular mechanism of
integration site selection. Retroviral integration is a nonrandom process, with different patterns
of favored and disfavored target sites for each retroviral family. Interestingly, cluster analyses
of different retroviruses based on their integration preferences mirror phylogenetic trees based
on the sequence similarity of their INs, and both are in good agreement with traditional trees
based on genomic sequences. This strongly suggests a link between integration site selection and
evolution, and puts forward integration site selection as part of the strategy by which
retroviruses maximize their fitness.

The integration pattern of retroviruses or retrotransposons varies among different genera. While
some members of the Ty3 retrotransposon lineage acquired a chromodomain at their integrase
C-terminal end, other retrotransposons interact with cellular proteins to establish chromatin
targeting. In this regard, the involvement of the TFIIIB component of the Pol III transcription
apparatus in Ty3 retrotransposon targeting and of Sir4p in Ty5 targeting were studied before.
Lentiviruses are targeted to active transcription units via their interaction with LEDGF/p75. The
latter protein does not bind the C-terminal end of lentiviral IN, but a cleft that is formed by
the IN core dimer. Current retargeting strategies are based on (transient) expression of
alternative tethers by linking the C-terminal domain of LEDGF/p75 to heterologous chromatin
binding domains such as CBX1. Although these methods allow retargeting, they are not readily
applicable in a clinical gene therapy setting.

Figure 4 BinMLV vector integration is targeted away from transcription start sites. (a) Murine leukemia virus (MLV)-based vector
integration sites obtained from SupT1 cells and their genomic distribution. Integration percentages in 2 and 4 kb windows around TSS, CpG
island midpoints, DHS, and oncogene TSS are listed. For comparison, foamy virus (FV) and HIV-1 data sets are included. P values (*) show
significant departures (***P < 0.001, pairwise Fisher's test) from MRC (not shown) and MLV_IN-WT. (b-d) Integration frequencies for MLV_IN-WT,
MLV_IN-dCt, MLV_IN-W390A, FV, and HIV-1 in 750 bp bins around TSS, CpG islands, and DHS in SupT1 cells. (e) Integration frequency (%) of the
indicated vectors in a 2 kb window around TSS of oncogenes (***P < 0.001, pairwise Fisher's test). (f,g) Heat maps summarizing the relation
between vector integration site frequency and genomic (f) or epigenetic (g) features in SupT1 cells. Evaluated vectors are indicated above the
columns. Features analyzed are shown to the left of the corresponding row of the heat map. Tile colors indicate whether a particular feature
is favored or disfavored for integration of the respective data sets relative to their MRCs, as detailed in the colored ROC area scale at the
bottom of the panel. P values (*) show significance of departures from MLV_IN-WT integration sites in SupT1 cells (**P < 0.01; ***P < 0.001, Wald
statistics referred to χ2 distribution). CpG, CpG-rich islands; DHS, DNase I-hypersensitive sites; TSS, transcription start sites.

In this manuscript, we developed BET-independent MLV (BinMLV) vectors with wild-type transduction
efficiency. The MLV IN-BET interaction is mediated by the BET ET domain and the unstructured IN
C-terminal tail. We identified a motif (termed BET-interaction motif) in the C-terminal tail that
is conserved in all gammaretroviral integrase proteins. Specific searches in protein databases did
not reveal other proteins harboring this full motif. Our in vitro analysis revealed that deletion
of the C-terminal tail or introduction of the W390A mutation in the BET-interaction motif is
sufficient to abrogate the BET interaction. In contrast, Gupta et al. reported residues E266,
L268, and Y269 in the IN core domain to be critical for BET interaction. Although our in vitro
analysis did not reveal a significant effect of these residues on the direct interaction with BET
proteins, we cannot exclude a role for the MLV IN core domain in BET binding in vivo.

Analysis of the integration profiles of MLV_IN-W390A and MLV_IN-dCt revealed that the integration
site distribution of BinMLV vectors is less associated with TSS, CpG islands and DHS compared to
MLV_IN-WT. Of note, the integration profiles of MLV_IN-W390A and MLV_IN-dCt are similar,
indicating that the W390A mutation is as potent as the complete deletion of the C-terminal tail in
abrogating the BET interaction in vivo. Although BinMLV integration is still significantly
different from MRC for the large majority of genomic features (P < 0.001), the distribution shifts
more toward random (Figure 4f). Possibly, the open chromatin surrounding TSS is more accessible to
BinMLV vectors and other cellular or viral determinants.

Earlier studies showed that a virus lacking the IN C-terminal tail is viable; however, this
deletion resulted in a two- to fourfold decrease in IN catalytic activity in vitro. Our results
show that deletion of the C-terminal tail in the context of a vector results in a sixfold
production defect and a twofold integration defect. In contrast, the transduction efficiency of
MLV_IN-W390A was in line with that of MLV_IN-WT, opening perspectives for the use of
BinMLV_IN-W390A vectors in a clinical gene therapy setting. The decreased expression levels (mean
fluorescence intensity, Supplementary Figure S2c,d) can be attributed to the more random
integration pattern.

Comparison of MLV_IN-W390A and MLV_IN-dCt BinMLV vector integration sites to a set of epigenetic
modifications revealed that overall integration is distributed more randomly and, as such, is less
associated with markers of active chromatin. Interestingly, we observed significantly less
association with oncogene TSS, a primary determinant for insertional mutagenesis. These results
suggest that BinMLV vectors have superior properties compared with the MLV-derived vectors
currently used in clinical trials with regard to their oncogenic potential. In a next step, the
biosafety of these vectors should be determined.

Material and methods 

Plasmids. All oligonucleotides used are listed in Supplementary Table S1. All enzymes were
purchased from Fermentas (Thermo Scientific, St Leon-Rot, Germany). To generate the recombinant
protein expression constructs for GST-IN_Ct and derived mutants, oligos 1 to 18 were annealed and
ligated in BamHI/XhoI digested pGEX-6P-2 (GE Healthcare, Diegem, Belgium). The plasmid encoding
full-length His6-tagged recombinant MLV IN (pKB-IN6H-MLV-IN) was kindly provided by 0. Johnson
(Piscataway, NJ). To generate protein expression constructs for GST-tagged IN_W390A and IN_dCt,
oligos 19-20 and oligos 21-22 were annealed and ligated into XmaI/SalI digested pKB-IN6H-MLV-IN,
respectively. For clarity, W390 is the actual position in MLV IN that interacts with the ET domain
of BET proteins. However, when using recombinant proteins, an additional start codon (ATG,
methionine) is included, shifting all IN amino acids by one position. In our previous paper, in
which only recombinant proteins were used, we therefore referred to this mutation as W391A. For
simplicity we employ W390A throughout this manuscript. MLV IN mutations E266A, L268A, and Y269A in
full-length MLV IN were introduced via site-directed, ligase-independent mutagenesis (SLIM) in
pKB-IN6H-MLV-IN using oligos 23-30 as previously described. To create the His6-BRD4_ET expression
construct, BRD4 ET was recombined from the pDONR221-BRD4_ET plasmid into pHXGWA using the Gateway
system (Invitrogen, Merelbeke, Belgium). Expression constructs for MBP-tagged BRD2_ET and BRD3_ET
were described earlier. The MLV vector packaging plasmid (pCgp 608) was kindly provided by F.D.
Bushman (Philadelphia, PA).

The Gagpol reading frame from pCgp 608 was PCR amplified with oligos 31 and 32, digested with
EcoRI/NheI and subcloned into EcoRI/NheI digested peGFP-C1 (pC1_608). The MLV IN W390A mutation
and dCt truncation in pC1_608 were first introduced in a shuttle plasmid containing part of MLV
Gagpol by SLIM mutagenesis using oligos 33-36 and 37-40, respectively. The resulting shuttle
plasmids were digested with BbvCI/EcoRI and the released fragments containing the mutation were
cloned back into BbvCI/EcoRI digested pC1_608 to create MLV_IN-W390A and MLV_IN-dCt, respectively.
The integrity of all plasmids was verified by DNA sequencing.

Cell culture. SupT1 cells and CD4+ T-cells (ATCC CRL-1942) were cultured in Roswell Park Memorial
Institute medium (RPMI-1640, Gibco-BRL, Merelbeke, Belgium) supplemented with 10% heat-inactivated
fetal calf serum (Sigma-Aldrich, Bornem, Belgium) and gentamicin (50 µg/ml, Gibco-BRL). HeLa cells
and 293T cells were cultured in Dulbecco's modified Eagle medium (Gibco-BRL) supplemented with 8%
heat-inactivated fetal calf serum and gentamicin. All cells were grown in a humidified atmosphere
with 5% CO2 at 37 °C.

T-cell purification. Peripheral blood mononuclear cells were purified from a buffy coat using
density-gradient centrifugation (Lymphoprep; Axis-Shield PoC AS, Oslo, Norway). Primary CD4+ T
cells were isolated using negative selection (MACS; Miltenyi Biotec, Leiden, the Netherlands) and
stimulated with CD2, CD3, CD28 beads (MACS).

Retroviral vector production. Viral vectors were produced as previously described. Briefly,
MLV-based vectors were produced by a triple PEI-based transfection of 293T cells with pVSV-G
envelope, pC1_608 packaging plasmid or its derived mutants (see above) and p450-GFP transfer
plasmid (kindly provided by F.D. Bushman). Vector titer, represented as reverse transcriptase
units, was determined by the SYBR Green I product-enhanced reverse transcriptase assay.

Retroviral vector transduction. SupT1 cells (12x10''/well), 
CD4+ T-cells (20x10Vwell) and NIH3T3 or HeLa cells 
(2 X 1 0Vwell) were seeded in 96-well plates and subsequently 
transduced with a dilution series of the respective vectors. 
Forty-eight hours post-transduction, 50% of the cells were 
harvested for fluorescence-activated cell sorting analysis, 
while the remaining 50% were cultured for 10 days for 
fluorescence-activated cell sorting analysis, to determine 
integrated copies and to perform integration site analysis. 

gDNA isolation and quantitative PCR. Two million cells were 
pelleted and genomic DNA was extracted using a mammalian 
genomic DNA miniprep kit (Sigma-Aldrich). Genomic 
DNA concentrations were determined using standard 
spectrophotometric methods. Samples corresponding 
to 700 ng genomic DNA were used for analysis. Each 
reaction contained 12.5 µl iQ Supermix (Bio-Rad, Nazareth,
Belgium), 40 nmol/l forward and reverse primer (oligo 41
and 42, respectively) and 40 nmol/l of GFP probe (oligo 43)
in a final volume of 25 µl. RNaseP was quantified as an
endogenous control (TaqMan RNaseP control reagent,
Applied Biosystems, The Netherlands). Samples were run in
triplicate for 3 minutes at 95 °C followed by 50 cycles of 10
seconds at 95 °C and 30 seconds at 55 °C in a LightCycler
480 (Roche Applied Science, Vilvoorde, Belgium). Analysis
was performed using the LightCycler 480 software supplied 
by the manufacturer. 
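
As an illustration only, the relation between the GFP and RNaseP signals can be sketched with the delta-Ct method. The text does not state which quantification mode of the LightCycler software was used, so the function name, the assumed amplification efficiency of 2, and the assumption of two RNaseP copies per cell (which may not hold for aneuploid cell lines) are all hypothetical.

```python
def relative_copies_per_cell(ct_gfp, ct_rnasep, efficiency=2.0, ref_copies_per_cell=2):
    """Estimate vector copies per cell from replicate Ct values (delta-Ct method).

    Assumptions (not taken from the paper): both amplicons amplify with the
    same efficiency, and the reference gene is present at ref_copies_per_cell.
    """
    mean_gfp = sum(ct_gfp) / len(ct_gfp)        # mean Ct of the GFP (vector) reaction
    mean_ref = sum(ct_rnasep) / len(ct_rnasep)  # mean Ct of the RNaseP control
    delta_ct = mean_gfp - mean_ref
    # Each cycle of delay corresponds to an 'efficiency'-fold lower template amount.
    return ref_copies_per_cell * efficiency ** (-delta_ct)


# Triplicate example: GFP crosses threshold one cycle later than RNaseP.
copies = relative_copies_per_cell([25.0, 25.0, 25.0], [24.0, 24.0, 24.0])
```

With these example values the GFP template is half as abundant as the reference, which under the stated assumptions corresponds to one vector copy per cell.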

Protein purification. Escherichia coli BL21 chemically
competent cells were transformed with prokaryotic expression
constructs. Cultures were grown at 37 °C to OD 0.6 and
induced with 1 mmol/l isopropyl β-D-1-thiogalactopyranoside
at 30 °C for 3 hours for induction of the C-terminal tail and
at 16 °C for 1 hour for full-length MLV IN. Cell pellets were
lysed in a lysis buffer (50 mmol/l Tris/HCl pH 7.3, 250 mmol/l
NaCl, 1 mmol/l PMSF, 5 mmol/l DTT, 10 IU recombinant
DNase/10 ml lysate) and sonicated. The lysates were
cleared by centrifugation at 15,000g for 30 minutes. Proteins
were purified on a column containing an appropriate affinity
resin for the specific tag. Glutathione Sepharose (GE Life
Sciences, Diegem, Belgium) and amylose resin (New
England Biolabs, Leiden, Netherlands) were used for GST
and MBP purifications, respectively. Ni2+-resin (Invitrogen) was
used for His6-tagged proteins. The columns were washed in
wash buffer (50 mmol/l Tris/HCl pH 7.3, 250 mmol/l NaCl, and
5 mmol/l DTT). Subsequently, proteins were eluted in wash
buffer supplemented with 50 mmol/l reduced glutathione for
GST purifications, 20 mmol/l maltose for MBP purifications or
250 mmol/l imidazole for Ni2+ purifications, respectively. Eluted
proteins were dialyzed overnight against wash buffer containing
10% glycerol and stored at -80 °C.

AlphaScreen binding assay. AlphaScreen measurements
were performed in a total volume of 25 µl in 384-well Optiwell
microtiter plates (PerkinElmer, Zaventem, Belgium). All
components were diluted to the desired concentrations in
assay buffer (25 mmol/l Tris/HCl pH 7.4, 150 mmol/l NaCl, 1
mmol/l MgCl2, 0.1% Tween-20, 5 mmol/l DTT, and 0.1% bovine
serum albumin). The affinities of GST-IN_Ct and its respective
mutants were determined against a fixed concentration of
His-BRD4_ET (20 nmol/l) or MBP-BRD4_ET (2 nmol/l), while
for His6-tagged full-length MLV IN, its respective mutants or
IN_dCt, a fixed concentration of 80 nmol/l was tested against
a dilution series of GST-BRD4_ET. After addition of the proteins,
the plate was incubated for 1 hour at 4 °C. Subsequently, 20
µg/ml anti-GST or anti-MBP donor and Ni2+-chelate acceptor
beads (PerkinElmer) were added, bringing the final volume to
25 µl. After 1 hour incubation at 30 °C in the dark, the plate
was read on an EnVision Multilabel Reader in AlphaScreen
mode (PerkinElmer). Results were analyzed with Prism 5.0
(GraphPad software) after nonlinear regression with the
appropriate equations (one-site specific binding).
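
For reference, the one-site specific binding model as it is generally defined for nonlinear regression takes the form below. The symbols are chosen here for illustration: X is the concentration of the titrated protein, Y the AlphaScreen signal, B_max the signal at saturation, and K_d the apparent dissociation constant.

```latex
Y = \frac{B_{\max}\, X}{K_d + X}
```

Under this model the signal reaches half of B_max when X equals K_d, so the fitted K_d gives the apparent affinity of the interaction.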

Recovery of integration sites and analysis of integration site
distributions. Recovery of integration sites was performed
as previously described. Briefly, linkers were ligated to
restriction enzyme-digested (MseI) genomic DNA isolated
from transduced cells and virus-host DNA junctions were
amplified by nested PCR. Samples were individually barcoded
with the second pair of PCR primers to generate 454 libraries.
PCR products were purified and sequenced using 454/Roche
pyrosequencing (Titanium Technology, Roche). Reads were
quality-filtered by requiring perfect matches to the LTR linker,
barcode, and flanking LTR and subsequently mapped to the
human/mouse genome. All sites were required to align to the
reference genome within 3 bp of the LTR edge. In order to
control for possible biases in the datasets due to the choice
of the MseI restriction endonuclease in cloning integration
sites, MRC sites were generated in silico. To do so, each
experimental integration site was paired with three sites in
the genome, located at the same distance from a randomly
selected MseI site in the genome.
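
The pairing of experimental and control sites described above can be sketched as follows. This is a minimal single-sequence illustration, not the pipeline used in the study: the function names are hypothetical, strand and chromosome handling are omitted, and the distance is measured to the next MseI site (TTAA) downstream of the integration site, which is an assumption.

```python
import random

MSEI_SITE = "TTAA"  # MseI recognition sequence


def find_msei_sites(sequence):
    """Return the 0-based start position of every MseI site in the sequence."""
    positions = []
    start = sequence.find(MSEI_SITE)
    while start != -1:
        positions.append(start)
        start = sequence.find(MSEI_SITE, start + 1)
    return positions


def matched_random_controls(site, msei_positions, n_controls=3, rng=random):
    """Pick n_controls positions lying as far from a random MseI site
    as the experimental site lies from its next downstream MseI site."""
    downstream = [p for p in msei_positions if p >= site]
    if not downstream:
        return []  # no MseI site downstream: no distance to match
    distance = min(downstream) - site
    # Keep only MseI sites that leave room for a control inside the sequence.
    candidates = [p for p in msei_positions if p - distance >= 0]
    return [rng.choice(candidates) - distance for _ in range(n_controls)]


toy_genome = "GGGGTTAAGGGGGGTTAACC"
sites = find_msei_sites(toy_genome)
controls = matched_random_controls(10, sites, rng=random.Random(0))
```

Each control lies at the matched distance from the MseI site that was drawn for it; it is not guaranteed that this is also the nearest MseI site to the control.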

Analyses were carried out as described. A detailed
account of the statistical methods used and the methods for
forming and analyzing heat maps using ROC curves can be
found in ref. 48. Consensus sequence analysis at the point of
integration was performed using WebLogo
(http://weblogo.berkeley.edu/logo.cgi). For association with
specific genomic features, the distance of each integration
site (in kb) to the respective genomic feature was calculated
(midpoint of the CpG island or DHS, and 5′-end of genes as a
measure for TSS). Integration sites left of the genomic feature
were given negative kb values, while integration sites toward
the right were given positive values. Subsequently, the
integration site data were pooled in bins of 750 bp, starting
at 0 to 750 bp and increasing or decreasing in steps of 750 bp
(window of -7,500 to 7,500 bp around the respective genomic
features). The percentage of integration sites occurring at a
certain distance from the feature was plotted versus the
distance. For heat maps, comparisons were carried out over
three different interval sizes surrounding each integration
site (5, 10, and 50 kb), since previous studies have shown that
the interval sizes chosen for comparison can influence the
conclusions. In this study, results were similar for each
interval size examined (data not shown), so only the data for
10 kb intervals are shown. Results of statistical tests
comparing the distributions of integration sites to the
reference dataset are summarized as asterisks on each tile of
the heat map. Type I error was accounted for by subjecting all
P values to Bonferroni correction.
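
The distance binning described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: distances are taken in bp, bins are half-open intervals labeled by their lower edge, and percentages are expressed relative to all sites supplied, including those outside the window.

```python
import math

BIN_WIDTH = 750   # bp per bin, as stated in the text
WINDOW = 7500     # bp on either side of the genomic feature


def bin_distances(distances_bp):
    """Return {bin lower edge in bp: percentage of all sites in that bin}."""
    n_bins = 2 * WINDOW // BIN_WIDTH
    counts = {-WINDOW + i * BIN_WIDTH: 0 for i in range(n_bins)}
    for d in distances_bp:
        if -WINDOW <= d < WINDOW:
            # floor() keeps negative distances in the bin to their left,
            # e.g. -100 bp falls in the [-750, 0) bin.
            counts[math.floor(d / BIN_WIDTH) * BIN_WIDTH] += 1
    total = len(distances_bp)
    if total == 0:
        return {edge: 0.0 for edge in counts}
    return {edge: 100.0 * c / total for edge, c in counts.items()}


# Signed distances of five hypothetical integration sites to the nearest TSS.
profile = bin_distances([0, 100, -100, 800, 10000])
```

Plotting the dictionary values against the bin edges gives a profile of the kind described, with negative edges to the left of the feature.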

BET protein ChIP-seq data obtained in HEK293T cells and
CD4+ T cells were retrieved from the Gene Expression Omnibus
(accession codes GSM971946-8 for BRD-2, -3, and -4
in HEK293T cells, respectively, and GSE33281 for BRD4 in
CD4+ T cells). Extended sequence read densities were
determined in 10 kb windows around wild-type MLV, MLV_IN dCt,
and MLV_IN W390A integration sites, or MRC sites, respectively.
Densities were normalized for total sequencing depth; input
control (GSM971951) was subtracted and results were plotted
using R 3.0.1.
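
The normalization step can be illustrated with a short sketch. The original analysis was done in R; Python is used here only for illustration, and the function name, the reads-per-million scaling, and the per-window count lists are assumptions rather than details taken from the paper.

```python
def normalized_density(chip_counts, input_counts, chip_total_reads, input_total_reads):
    """Scale per-window read counts to reads per million, then subtract input.

    chip_counts and input_counts hold read counts for the same windows,
    in the same order; the totals are the library sizes of each dataset.
    """
    chip_scale = 1e6 / chip_total_reads
    input_scale = 1e6 / input_total_reads
    return [c * chip_scale - i * input_scale
            for c, i in zip(chip_counts, input_counts)]


# Two windows; the input library is half the size of the ChIP library.
density = normalized_density([10, 20], [5, 5], 1_000_000, 500_000)
```

Scaling both libraries before subtraction keeps a deeper-sequenced sample from appearing enriched simply because it contains more reads.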

Supplementary material 

Figure S1. Characterization of the MLV-BET interface.
Figure S2. Transduction efficiency and mean fluorescence
intensity for BinMLV vectors.
Figure S3. Loss of the BET interaction does not affect MLV
integration site neighborhood.
Figure S4. Loss of the BET interaction uncouples MLV
integration from BET hot spots.
Figure S5. Integration site distribution analysis for wild-type MLV
and MLV_IN W390A in human primary CD4+ T cells.
Table S1. Oligonucleotides used in this study.